Scaling Distributed File Systems in Resource-Harvesting Datacenters

Authors

  • Pulkit A. Misra
  • Iñigo Goiri
  • Jason Kace
  • Ricardo Bianchini

Abstract

Datacenters can use distributed file systems to store data for batch processing on the same servers that run latency-critical services. Taking advantage of this storage capacity involves minimizing interference with the co-located services, while implementing user-friendly, efficient, and scalable file system access. Unfortunately, current systems fail to meet one or more of these requirements, and must be manually partitioned across independent subclusters. Thus, in this paper, we introduce techniques for automatically and transparently scaling such file systems to entire resource-harvesting datacenters. We create a layer of software in front of the existing metadata managers, assign servers to subclusters to minimize interference and data movement, and smartly migrate data across subclusters in the background. We implement our techniques in HDFS, and evaluate them using simulations of 10 production datacenters and a real 4k-server deployment. Our results show that our techniques produce high file access performance, and high data durability and availability, while migrating a limited amount of data. We recently deployed our system onto 30k servers in Bing’s datacenters, and discuss lessons from this deployment.
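The abstract names a software layer in front of the metadata managers but does not describe its data structures. The following is a minimal Python sketch, assuming a router that maps global path prefixes to subclusters through a longest-prefix mount table; `MountTable`, `resolve`, `migrate`, and the subcluster IDs are hypothetical names, not the paper's implementation or the HDFS API. It illustrates why such a layer can move data between subclusters without clients changing the paths they use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout: MountTable, resolve, and migrate are
# illustrative only, not the paper's implementation or the HDFS API.


@dataclass
class MountTable:
    """Router-side map from global path prefixes to subcluster IDs."""
    entries: Dict[str, str] = field(default_factory=dict)

    def add(self, prefix: str, subcluster: str) -> None:
        # Normalize trailing slashes; the root stays "/".
        self.entries[prefix.rstrip("/") or "/"] = subcluster

    def resolve(self, path: str) -> str:
        """Return the subcluster owning `path` by longest-prefix match
        on whole path components."""
        best = None
        for prefix in self.entries:
            matches = (prefix == "/" or path == prefix
                       or path.startswith(prefix + "/"))
            if matches and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        return self.entries[best]


def migrate(table: MountTable, prefix: str, dst: str,
            copy_fn: Callable[[str, str, str], None]) -> None:
    """Move `prefix` to subcluster `dst`: copy first, then repoint.

    Until copy_fn returns, resolve() still sends clients to the source
    subcluster; afterwards the new, more specific mount entry sends them
    to `dst`. Clients keep using the same global path either way.
    """
    src = table.resolve(prefix)
    if src == dst:
        return
    copy_fn(prefix, src, dst)  # stand-in for the background bulk copy
    table.add(prefix, dst)     # a single mount-table update flips ownership


if __name__ == "__main__":
    table = MountTable()
    table.add("/", "sc0")
    table.add("/logs", "sc1")
    print(table.resolve("/logs/2017/01"))  # sc1
    print(table.resolve("/logsarchive"))   # sc0: prefixes match whole components only

    moved: List[Tuple[str, str, str]] = []
    migrate(table, "/logs/2017", "sc2", lambda p, s, d: moved.append((p, s, d)))
    print(table.resolve("/logs/2017/01"))  # sc2
    print(moved)                           # [('/logs/2017', 'sc1', 'sc2')]
```

Keeping the path-to-subcluster mapping in one layer means a migration changes only router state once the copy completes. The abstract does not specify the paper's server-assignment or migration policies, so those are left out of this sketch.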

Similar resources

CPU Frequency Scaling Algorithm for Energy-saving in Cloud Data Centers

High energy consumption has become an urgent problem in cloud datacenters. Based on virtualization technologies, the pay-as-you-go resource provision paradigm has become a trend. Specifically, the Virtual Machine (VM) is the basic resource unit in a data center for resource migration and provisioning. Much research has been devoted to improving datacenter resource utilization and reducing power consumpti...

Efficient Workload and Resource Management in Datacenters by Hong Xu

Efficient Workload and Resource Management in Datacenters. Hong Xu, Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto, 2013. This dissertation focuses on developing algorithms and systems to improve the efficiency of operating mega datacenters with hundreds of thousands of servers. In particular, it seeks to address two challenges: First, how to distr...

Energy-efficient Data-intensive Computing with a Fast Array of Wimpy Nodes

Large-scale data-intensive computing systems have become a critical foundation for Internet-scale services. Their widespread growth during the past decade has raised datacenter energy demand and created an increasingly large financial burden and scaling challenge: Peak energy requirements today are a significant cost of provisioning and operating datacenters. In this thesis, we propose to reduce th...

A Genetic Based Resource Management Algorithm Considering Energy Efficiency in Cloud Computing Systems

Cloud computing is a result of the continuing progress made in the areas of hardware, technologies related to the Internet, distributed computing and automated management. The increasing demand has led to an increase in services resulting in the establishment of large-scale computing and data centers, in addition to high operating costs and huge amounts of electrical power consumption. Insuffic...

Distributed VNF Scaling in Large-scale Datacenters: An ADMM-based Approach

Network Functions Virtualization (NFV) is a promising network architecture where network functions are virtualized and decoupled from proprietary hardware. In modern datacenters, user network traffic requires a set of Virtual Network Functions (VNFs) as a service chain to process traffic demands. Traffic fluctuations in Large-scale DataCenters (LDCs) could result in overload and underload pheno...

Publication date: 2017